Introduction to Causal Inference

Sheng Cao

Introduction

What is causal inference?

  • Cause-and-effect questions: Does X cause Y? If X causes Y, how large is the effect of X on Y? Is the size of this effect large relative to the effects of other causes of Y? (Morgan and Winship 2014)

    • E.g. Did mandatory busing programs in the 1970s increase the school achievement of disadvantaged minority youth? If so, how much of a gain was achieved? (Morgan and Winship 2014)
    • E.g. Does obtaining a college degree increase an individual’s labor market earnings? If so, is this particular effect large relative to the earnings gains that could be achieved only through on-the-job training? (Morgan and Winship 2014)

Why is it important?

  • It is essential to science: If we are choosing between treatments for a disease, we want to choose the treatment that causes the most people to be cured, without causing too many bad side effects. (Neal 2020)

  • It is essential for rigorous decision-making: We are considering several different policies to implement to reduce greenhouse gas emissions, and we must choose just one due to budget constraints. If we want to be maximally effective, we should carry out causal analysis to determine which policy will cause the largest reduction in emissions. (Neal 2020)

Background

Correlation is not Causation

  • How can wearing shoes to bed and waking up with a headache be associated without either being a cause of the other? It turns out that they are both caused by a common cause: drinking the night before. The total association observed can be made up of both confounding association and causal association. It could also be the case that wearing shoes to bed has some small causal effect on waking up with a headache; then the total association would be neither solely confounding association nor solely causal association, but a mixture of both. (Neal 2020)
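
As a minimal numeric sketch of this point (my own toy probabilities, not Neal's), the simulation below generates “drinking,” “shoes to bed,” and “headache” so that drinking is the only cause of the other two; the marginal association between shoes and headaches disappears once we condition on drinking:

```python
# Toy confounding simulation: drinking causes both shoes-to-bed and headache,
# so the two are associated even though neither causes the other.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

drinking = rng.binomial(1, 0.3, n)                  # common cause
shoes = rng.binomial(1, 0.05 + 0.60 * drinking)     # caused by drinking only
headache = rng.binomial(1, 0.10 + 0.50 * drinking)  # caused by drinking only

# Marginal association: P(headache | shoes) differs from P(headache | no shoes)
print(headache[shoes == 1].mean(), headache[shoes == 0].mean())

# Conditioning on the common cause removes the association
for d in (0, 1):
    sub = drinking == d
    print(d, headache[sub & (shoes == 1)].mean(), headache[sub & (shoes == 0)].mean())
```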

Directed Acyclic Graph (DAG)

  • Definition: A graph is a collection of nodes (also called “vertices”) and edges that connect the nodes. A directed path is a path that consists of directed edges that are all directed in the same direction. If there are no cycles, which are directed paths from some node X back to itself, in a directed graph, the graph is known as a directed acyclic graph (DAG). (Neal 2020)
  • Causal Chain: In chain graphs, \(X_1\) and \(X_3\) are usually dependent simply because \(X_1\) causes changes in \(X_2\) which then causes changes in \(X_3\). \(X_2\) can be called the mechanism. (Neal 2020)

    • Controlling for \(X_2\) prevents information about \(X_1\) from getting to \(X_3\), or vice versa (Pearl 2019). So we should not control for \(X_2\) when estimating the total causal effect of \(X_1\) on \(X_3\).

  • Fork (Confounding Junction): Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. In the fork X ← Q → Y, Q is a confounder of X and Y. (Pearl 2019)

    • Q is a source of omitted variable bias. If we omit Q from the regression of Y on X, the coefficient on X will pick up not only the causal effect of X on Y but also the non-causal association that flows through Q. Therefore, we should control for Q when exploring the causal relationship between X and Y.

  • Collider (M-bias): In a collider graph X → Z ← Y, the variables X and Y start out independent, so that information about X tells you nothing about Y. But if you control for Z, information starts flowing through the “pipe,” due to the explain-away effect. (Pearl 2019) Therefore, we should not control for Z when exploring the causal relationship between X and Y.

  • Fundamental Rule: Controlling for descendants (or proxies) of a variable is like “partially” controlling for the variable itself. Controlling for a descendant of a mediator partly closes the pipe; controlling for a descendant of a collider partly opens the pipe. (Pearl 2019) The simulation sketch after this list illustrates the three junctions.
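
These rules can be checked numerically. The sketch below (illustrative linear models with made-up coefficients, not taken from Pearl 2019) compares the regression coefficient on X with and without the extra control for a chain, a fork, and a collider:

```python
# Chain, fork, and collider in a linear-Gaussian simulation.
# ols() returns the coefficient on the first regressor.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def ols(y, *regressors):
    X = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Chain: X1 -> X2 -> X3. Controlling for the mediator X2 blocks the effect.
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(size=n)
x3 = 3 * x2 + rng.normal(size=n)
print("chain:   ", ols(x3, x1), ols(x3, x1, x2))   # ~6 (total effect) vs ~0

# Fork: X <- Q -> Y. Controlling for the confounder Q removes the bias.
q = rng.normal(size=n)
x = q + rng.normal(size=n)
y = 2 * q + rng.normal(size=n)                      # no causal effect of x on y
print("fork:    ", ols(y, x), ols(y, x, q))         # biased vs ~0

# Collider: X -> Z <- Y. Controlling for Z *creates* a spurious association.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(size=n)
print("collider:", ols(y, x), ols(y, x, z))         # ~0 vs clearly nonzero
```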

The Fundamental Problem of Causal Inference

  • It is impossible to observe all potential outcomes for a given individual (Rubin 1974); Holland (1986) defines this as the fundamental problem of causal inference.

    • E.g. You could observe Y(1) by getting a dog and observing your happiness after getting a dog. Alternatively, you could observe Y(0) by not getting a dog and observing your happiness. However, you cannot observe both Y(1) and Y(0), unless you have a time machine that would allow you to go back in time and choose the version of treatment that you didn’t take the first time. (Neal 2020)
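
A toy potential-outcomes table makes the problem concrete (the “dog” numbers below are invented for illustration): every unit has both Y(0) and Y(1), but the data reveal only one of them per unit, so the individual-level effect is never observed.

```python
# Toy potential-outcomes setup: both columns exist, but only one is observed.
import numpy as np

rng = np.random.default_rng(2)
n = 10

y0 = rng.normal(5, 1, n)        # happiness without a dog
y1 = y0 + 1.0                   # happiness with a dog (true effect = 1 for everyone)
t = rng.binomial(1, 0.5, n)     # who actually gets a dog

y_observed = np.where(t == 1, y1, y0)   # the factual outcome is all we ever see

true_ate = (y1 - y0).mean()     # needs both columns -> unobservable in practice
naive = y_observed[t == 1].mean() - y_observed[t == 0].mean()   # noisy with few units
print(true_ate, naive)
```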

Experimental methods

Definition

  • The individuals or material investigated, the nature of the treatments or manipulations under study and the measurement procedures used are all selected, in their important features at least, by the investigator. (Cox and Reid 2000)
    • E.g. The “treatment group” receives the medicine being evaluated, while the “control group” receives a harmless, ineffective substitute (a placebo). Each subject is assigned at random to either the treatment or the control condition.

Randomized Controlled Trial (RCT)

  • This is a type of experiment in which participants are randomly assigned to an intervention group (that receives a treatment or intervention) or a control group (that does not receive the treatment).

    • In policy-related RCTs, the “treatment” is exposure to some policy or program (e.g., a job training program, an income supplement program, a sex education program, etc.).

Pros and Cons of RCT

  • Pros: Association is causation. Randomized experiments guarantee that there is no confounding. In DAG terms, suppose X is a confounder of the effect of T on Y: non-causal association flows along the backdoor path T ← X → Y. If we randomize T, however, T no longer has any causal parents, because T is determined purely at random, so the backdoor path is closed. (Neal 2020) A simulation sketch after this list illustrates the point.

  • Cons: RCTs may be expensive and time-consuming to implement. Randomization requires a carefully designed and closely monitored experiment, probably on a small scale (due to cost considerations) and/or for a narrowly defined group (due to practical constraints), which can limit the study’s external validity.
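
The following sketch (a simple linear model of my own, not from Neal 2020) contrasts a confounded observational assignment with a randomized one; the naive difference in means is biased in the first case and recovers the true effect in the second:

```python
# Randomization removes confounding: compare confounded vs randomized assignment.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.normal(size=n)          # confounder
true_effect = 2.0               # true effect of T on Y

# Observational assignment: T depends on X (backdoor path T <- X -> Y is open)
t_obs = (x + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * t_obs + 3 * x + rng.normal(size=n)
print(y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean())   # biased, well above 2

# Randomized assignment: T has no causal parents, so the backdoor path is gone
t_rct = rng.binomial(1, 0.5, n).astype(float)
y_rct = true_effect * t_rct + 3 * x + rng.normal(size=n)
print(y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean())    # ~ 2
```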

Observational methods

Definition

  • In an observational study some of these features, and in particular the allocation of individuals to treatment groups, are outside the investigator’s control. (Cox and Reid 2000)

Quasi-experiment

  • We often rely on variation in policy-relevant variables that is “as-if” random, since experimental methods are expensive, time-consuming, and face external validity problems.
  • Quasi-experiments use observational (rather than experimental) data.

Quasi-experimental Method 1: Instrumental Variables

  • Definition: The information about the movements in X that are uncorrelated with the error term is gleaned from one or more additional variables, called instrumental variables or simply instruments. Instrumental variables regression uses these additional variables as tools or “instruments” to isolate the movements in X that are uncorrelated with the error term. (Stock and Watson 2020)

  • Example: (Miguel, Satyanath, and Sergenti 2004) This paper estimates the impact of economic conditions on the likelihood of civil conflict. The authors use rainfall variation as an instrumental variable for economic growth in 41 African countries during 1981–1999. Weather shocks are plausible instruments for growth in gross domestic product in economies that rely largely on rain-fed agriculture, that is, that neither have extensive irrigation systems nor are heavily industrialized. Sub-Saharan Africa is an ideal region for this identification strategy: the World Development Indicators database indicates that only 1 percent of cropland is irrigated in the median African country, and the agricultural sector remains large.
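
A minimal two-stage least squares sketch on simulated data is shown below. The variable names echo the rainfall/growth/conflict setting, but the data-generating process and coefficients are invented; in practice one would use a dedicated IV routine that also reports correct standard errors.

```python
# Manual 2SLS on simulated data: rainfall instruments for endogenous growth.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

rainfall = rng.normal(size=n)                 # instrument: affects growth only
u = rng.normal(size=n)                        # unobserved factor driving both
growth = 0.5 * rainfall + u + rng.normal(size=n)          # endogenous regressor
conflict = -1.0 * growth + 2 * u + rng.normal(size=n)     # true effect of growth = -1

def ols(y, x):
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS of conflict on growth is badly biased because growth is correlated with u
print("OLS:  ", ols(conflict, growth)[1])

# Stage 1: project growth on the instrument; Stage 2: regress conflict on the fit
growth_hat = ols(growth, rainfall) @ np.column_stack([np.ones(n), rainfall]).T
print("2SLS: ", ols(conflict, growth_hat)[1])             # close to -1
```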

Quasi-experimental Method 2: Regression Discontinuity

  • Definition: For some programs, there is a clear eligibility threshold (cutoff): if you are below (above) that threshold, you qualify for the program and, if you are above (below) the threshold, you don’t. RD takes advantage of the cutoff. As long as the only thing that changes at the cutoff is that the person gets the treatment, then any jump up or down in the dependent variable at the cutoff will reflect the causal effect of treatment. (Bailey 2019)

  • Example: (Gormley and Gayer 2005) The authors explore the effect of Oklahoma’s universal pre-kindergarten (pre-K) program for four-year-olds on children’s test scores in Tulsa Public Schools. They use a regression-discontinuity approach that contrasts the test performance of children born just before the cut-off date (the treatment group) with that of children born just after the cut-off date (the control group), while controlling for continuous age effects.
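
The sketch below simulates a sharp discontinuity (invented running variable, cutoff, and effect size; not Gormley and Gayer's specification) and estimates the jump by fitting separate linear trends on each side of the cutoff:

```python
# Sharp RD sketch: estimate the jump in the outcome at the cutoff (running = 0).
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

running = rng.uniform(-1, 1, n)               # e.g., birth date relative to cutoff
treated = (running < 0).astype(float)         # below the cutoff -> eligible
score = 10 + 4 * running + 2.0 * treated + rng.normal(size=n)   # true jump = 2

# Fit a separate linear trend on each side and compare the intercepts at 0
def line_at_zero(mask):
    X = np.column_stack([np.ones(mask.sum()), running[mask]])
    return np.linalg.lstsq(X, score[mask], rcond=None)[0][0]

rd_estimate = line_at_zero(treated == 1) - line_at_zero(treated == 0)
print(rd_estimate)                            # ~ 2
```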

Quasi-experimental Method 3: Difference-in-Differences

  • Definition: To control for systematic differences between the control and treatment groups, we need two years of data, one before the policy change and one after the change. Thus, our sample is usefully broken into four groups: the control group before the change, the control group after the change, the treatment group before the change, and the treatment group after the change. (Wooldridge 2016) By having two before-after comparisons, we create a much stronger study: a difference-in-difference study. The study gets its name from the fact that it compares the differences between two before-after differences. (Remler and Van Ryzin 2015)

    Average Outcome   Before   After   Difference
    No Treatment      A        B       B - A
    Treatment         C        D       D - C
    Difference        C - A    D - B   (D - C) - (B - A) = (D - B) - (C - A)
  • Example: (Eissa and Liebman 1996) This paper examines the impact of the Tax Reform Act of 1986 (TRA86) on the labor force participation and hours of work of single women with children. Single women without children serve as the control group. The difference between the change in labor force participation of single women with children and the change for single women without children is the estimate of the effect of TRA86 on participation. This is essentially the difference-in-differences approach.
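
The 2×2 logic in the table above can be reproduced directly. The simulation below (made-up group gap, time trend, and effect size) computes the four cell means A–D and the double difference:

```python
# Difference-in-differences with four group means, following the 2x2 table above.
import numpy as np

rng = np.random.default_rng(6)
n = 40_000

group = rng.binomial(1, 0.5, n)               # 1 = treatment group, 0 = control
period = rng.binomial(1, 0.5, n)              # 1 = after the policy change
effect = 1.5                                  # true treatment effect

# Outcome has a group gap and a common time trend plus the treatment effect
y = 2 * group + 1 * period + effect * group * period + rng.normal(size=n)

A = y[(group == 0) & (period == 0)].mean()    # control, before
B = y[(group == 0) & (period == 1)].mean()    # control, after
C = y[(group == 1) & (period == 0)].mean()    # treatment, before
D = y[(group == 1) & (period == 1)].mean()    # treatment, after

print((D - C) - (B - A))                      # ~ 1.5
```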

Quasi-experimental Method 4: Fixed Effects

  • Definition: If we include individual fixed effects in our regression model, we are controlling for every time-invariant characteristic of every member of our sample, including characteristics for which we don’t even have data.

  • Example: (Waldfogel 1997) This paper asks: does having children affect the wages of women in the United States? If women with lower motivation to succeed in the labor market are more likely to have children and less likely to have high earnings, this unobserved heterogeneity might explain the observed negative relationship between wages and having children. The author controls for this heterogeneity with fixed-effects and first-difference models.
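
A two-period sketch (simulated data, not Waldfogel's) shows the idea: the unobserved individual effect is correlated with the regressor, so pooled OLS is biased, while first-differencing removes the individual effect:

```python
# Two-period fixed-effects/first-difference sketch with a correlated individual effect.
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

alpha = rng.normal(size=n)                    # unobserved, time-invariant trait
x1 = alpha + rng.normal(size=n)               # regressor in period 1
x2 = alpha + rng.normal(size=n)               # regressor in period 2
beta = -0.5                                   # true effect

y1 = beta * x1 + 2 * alpha + rng.normal(size=n)
y2 = beta * x2 + 2 * alpha + rng.normal(size=n)

def slope(y, x):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(slope(np.r_[y1, y2], np.r_[x1, x2]))    # pooled OLS: biased by alpha
print(slope(y2 - y1, x2 - x1))                # first differences: ~ -0.5
```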

Quasi-experimental Method 5: Propensity Score Matching

  • Definition: Assume that we want to measure the effect of some policy “treatment,” but that people aren’t randomly assigned to this treatment. We could address selection bias by accounting for observed characteristics that are part of the “selection process” via stratification or multiple regression, or we could use the same set of variables to model the selection process explicitly, which is the approach taken by propensity score matching.

  • Example: (Harding 2003) This paper explores the effect of growing up in a high-poverty neighborhood on the probability of dropping out of high school or experiencing a teenage pregnancy. The author uses propensity scores to match each treated subject with one or more control subjects such that the treated subjects are, on average, identical to the control subjects on observable characteristics prior to treatment, and then compares individuals growing up in poor and non-poor neighborhoods (treatment and comparison groups) who are otherwise identical on observable characteristics.
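
A toy version of this two-step procedure is sketched below (invented covariates and effect size; scikit-learn's LogisticRegression is used for the propensity model, and each treated unit is matched to its nearest-neighbor control on the estimated score):

```python
# Propensity score matching sketch: estimate scores, match treated to controls.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 5_000

x = rng.normal(size=(n, 2))                             # observed covariates
p_treat = 1 / (1 + np.exp(-(x[:, 0] + x[:, 1])))        # selection depends on x
t = rng.binomial(1, p_treat)
y = 1.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)   # true effect = 1

# Step 1: estimate the propensity score from the observed covariates
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Step 2: match each treated unit to the control with the closest score
treated, controls = np.where(t == 1)[0], np.where(t == 0)[0]
matches = controls[np.abs(ps[controls][None, :] - ps[treated][:, None]).argmin(axis=1)]

print(y[t == 1].mean() - y[t == 0].mean())              # naive comparison: biased
print((y[treated] - y[matches]).mean())                 # matched comparison: ~ 1
```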

Conclusion

  • In conclusion, causal inference is a fundamental concept in research and is critical for making informed decisions in fields such as medicine, economics, and public policy. To establish causality, researchers can use both experimental and observational methods.

  • While experimental methods are considered the gold standard for establishing causality, they may not always be feasible or ethical.

  • Observational methods can be a valuable alternative, but they have limitations and cannot establish causality with the same degree of certainty as experimental methods.

    • To overcome the limitations of observational methods, researchers can use techniques such as instrumental variables, regression discontinuity, difference-in-differences, fixed effects, and propensity score matching. These methods can help identify causal effects in observational studies, but they also have limitations and require careful consideration of assumptions and potential sources of bias.

References

Bailey, Michael. 2019. Real Econometrics: The Right Tools to Answer Important Questions. 2nd ed. Oxford University Press.
Cox, David, and Nancy Reid. 2000. “The Theory of Design of Experiments,” January.
Eissa, Nada, and Jeffrey B. Liebman. 1996. “Labor Supply Response to the Earned Income Tax Credit.” Quarterly Journal of Economics 111 (2): 605–37. https://doi.org/10.2307/2946689.
Gormley, William T., and Ted Gayer. 2005. “Promoting School Readiness in Oklahoma: An Evaluation of Tulsa’s Pre-k Program.” The Journal of Human Resources 40 (3): 533–58. https://www.jstor.org/stable/4129551.
Harding, David J. 2003. “Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on Dropping Out and Teenage Pregnancy.” American Journal of Sociology 109 (3): 676–719. https://doi.org/10.1086/379217.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60. https://doi.org/10.2307/2289064.
Miguel, Edward, Shanker Satyanath, and Ernest Sergenti. 2004. “Economic Shocks and Civil Conflict: An Instrumental Variables Approach.” Journal of Political Economy 112 (4): 725–53. https://doi.org/10.1086/421174.
Morgan, Stephen L., and Christopher Winship. 2014. Counterfactuals and Causal Inference: Methods and Principles for Social Research. 2nd ed. Cambridge University Press. https://doi.org/10.1017/CBO9781107587991.
Neal, Brady. 2020. “Introduction to Causal Inference,” December.
Pearl, Judea. 2019. “Confounding and Deconfounding: Or, Slaying the Lurking Variable.” In The Book of Why: The New Science of Cause and Effect. Harlow, England: Penguin Books.
Remler, Dahlia K., and Gregg G. Van Ryzin. 2015. Research Methods in Practice: Strategies for Description and Causation. Second edition. Los Angeles: SAGE.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688–701. https://doi.org/10.1037/h0037350.
Stock, James H., and Mark W. Watson. 2020. Introduction to Econometrics. 4th ed., global edition. The Pearson Series in Economics. Harlow, England: Pearson.
Waldfogel, Jane. 1997. “The Effect of Children on Women’s Wages.” American Sociological Review 62 (2): 209–17. https://doi.org/10.2307/2657300.
Wooldridge, Jeffrey M. 2016. Introductory Econometrics: A Modern Approach. Sixth edition, student edition. Boston, MA: Cengage Learning.